DCIR05 Keerti Chalasani 12/7/19

Round 1: Food Inspection

Abstract:

I analyzed the Chicago Food Inspection data set, focusing on schools, to understand the condition of the schools that passed inspection. I did this by mapping the risk level of those schools and then performing sentiment analysis on the violation text.

Introduction:

The Chicago food inspection data set includes data from the year 2010 to 2019. Some variables included in this dataset are the inspection id, dba name, aka name, license #, facility type, risk, zip, inspection date, inspection type, results, violations, latitude, and longitude. In this analysis, I’ll be focusing on the inspection date, violations, risk, and facility type. The dataset has 187787 rows and 13 columns. Looking at the data set I realized that there were a lot of schools represented. I was curious to see how healthy schools were considering it’s very important for areas that hold a lot of children to be clean. I decided to see if schools that passed the inspection were still a risk to the public. Do schools that pass the inspection have a lower risk to the public? Are the sentiment values of the violation text positive? Do the sentiment values change over time?

rm(list=ls())
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(SentimentAnalysis)
## 
## Attaching package: 'SentimentAnalysis'
## The following object is masked from 'package:base':
## 
##     write
library(sentimentr)
library(dplyr)
library(ggplot2)
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(RColorBrewer)
library(patchwork)
library(here)
## here() starts at /Users/kc/Desktop
food <- read_csv("https://uofi.box.com/shared/static/5637axblfhajotail80yw7j2s4r27hxd.csv", 
    col_types = cols(Address = col_skip(), 
        `Census Tracts` = col_skip(), City = col_skip(), 
        `Community Areas` = col_skip(), `Historical Wards 2003-2015` = col_skip(), 
        `Inspection Date` = col_date(format = "%m/%d/%Y"), 
        Location = col_skip(), State = col_skip(), 
        Wards = col_skip(), `Zip Codes` = col_skip()))


dim(food)
## [1] 187787     13
colnames(food) <- tolower(colnames(food))
head(food)
## # A tibble: 6 x 13
##   `inspection id` `dba name` `aka name` `license #` `facility type` risk    zip
##             <dbl> <chr>      <chr>            <dbl> <chr>           <chr> <dbl>
## 1         2290733 POLLERIA … POLLERIA …     2428612 Poultry Slaugh… Risk… 60608
## 2         2290799 LA FAMILI… LA FAMILI…     2665437 Grocery Store   Risk… 60629
## 3         2290743 RED COACH… RED COACH…       45131 Restaurant      Risk… 60629
## 4         2290780 UVA KITCH… UVA KITCH…     2647067 Restaurant      Risk… 60640
## 5         2290770 TACO BELL  TACO BELL      2670614 Restaurant      Risk… 60624
## 6         2290739 ARCHIES    ARCHIES        2636959 Restaurant      Risk… 60626
## # … with 6 more variables: `inspection date` <date>, `inspection type` <chr>,
## #   results <chr>, violations <chr>, latitude <dbl>, longitude <dbl>
chi_bb <- c(left = -87.936287,
            bottom = 41.679835,
            right = -87.447052,
            top = 42.000835)

chicago <- get_stamenmap(bbox = chi_bb,
                                zoom = 12)
## 42 tiles needed, this may take a while (try a smaller zoom).
schools <- food %>% filter(`facility type` == "School", results == "Pass", risk != "All", risk != "")
ggmap(chicago) + geom_point(data = schools, aes(longitude, latitude, color = factor(risk)), size = .55) + labs(title = "Risk Level for Schools", x = "Longitude", y = "Latitude")
## Warning: Removed 570 rows containing missing values (geom_point).

Analysis:

This image shows the schools that passed the inspection, colored by risk level. There are many orange dots all around Chicago, meaning that although these schools passed the inspection, they are still classified as a high health risk for the general population. This is surprising: I would have assumed that schools that passed the inspection would be considered healthy and safe, yet they are still labeled as posing a health risk.
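The impression of "many orange dots" can be backed up with a simple count. The sketch below is a minimal base R illustration, not part of the original analysis: `schools_demo` is a hypothetical stand-in for the `schools` data frame, and the risk labels are assumed to follow the "Risk 1 (High)" / "Risk 2 (Medium)" / "Risk 3 (Low)" pattern.

```r
# Hypothetical stand-in for the `schools` data frame (passing school inspections).
# The risk labels are assumed, not taken from the printed output above.
schools_demo <- data.frame(
  risk = c("Risk 1 (High)", "Risk 1 (High)", "Risk 1 (High)",
           "Risk 2 (Medium)", "Risk 3 (Low)"),
  stringsAsFactors = FALSE
)

# Count inspections per risk category and convert the counts to shares.
risk_counts <- table(schools_demo$risk)
risk_share  <- round(prop.table(risk_counts), 2)

print(risk_counts)
print(risk_share)
```

Running the same two lines on the real `schools$risk` column would give the exact share of passing school inspections in each risk category.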

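# Combine both conditions with `&`. Passed as separate arguments, the facility-type
# condition is matched to subset()'s `drop` argument and never filters any rows.
# The printed results further down come from the original call and should be regenerated.
school_pass = subset(food, results == "Pass" & `facility type` == "School", select=c(violations,risk,`inspection date`,longitude, latitude))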
school_pass = drop_na(school_pass)
mysample <- school_pass[sample(1:nrow(school_pass), 700),]
# Mean sentence-level sentiment of one text, using only sentences longer than five words.
calculate_sentiment <- function(x){
  violations_sentiments <- sentiment(x)
  return( mean(violations_sentiments[violations_sentiments$word_count > 5]$sentiment))
}


sentiment_school = lapply(mysample$violations, calculate_sentiment)
sentiment_vector = unlist(sentiment_school)
#is.vector(sentiment_vector)

mysample$sentiment = sentiment_vector

I then calculated the sentiment value of the violation text for each sampled inspection. I was only able to run this on a random sample of 700 rows, because RStudio kept crashing on my computer with anything larger.
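The filtering and sampling step can be sketched in a reproducible form. This is a minimal base R illustration under stated assumptions: `food_demo` and its column names are hypothetical miniatures of the inspection table, `complete.cases()` stands in for `tidyr::drop_na()`, and the seed value is arbitrary.

```r
# Hypothetical miniature of the inspection table.
food_demo <- data.frame(
  facility_type = c("School", "Restaurant", "School", "School", "Grocery Store"),
  results       = c("Pass", "Pass", "Fail", "Pass", "Pass"),
  violations    = c("a", "b", "c", NA, "e"),
  stringsAsFactors = FALSE
)

# Combine both conditions with `&` so that both are applied to the rows.
school_pass_demo <- subset(
  food_demo,
  results == "Pass" & facility_type == "School",
  select = c(violations, results, facility_type)
)

# Drop incomplete rows (base R equivalent of tidyr::drop_na()).
school_pass_demo <- school_pass_demo[complete.cases(school_pass_demo), ]

# Fix the seed so the same rows are drawn on every run, and cap the sample
# size at the number of rows that are actually available.
set.seed(2019)
n_draw <- min(700, nrow(school_pass_demo))
sample_demo <- school_pass_demo[sample(seq_len(nrow(school_pass_demo)), n_draw), ]
```

Fixing the seed matters here because the summary statistics quoted in the text depend on which rows happen to be drawn.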

summary(mysample$sentiment)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.41079  0.08889  0.17689  0.18050  0.27003  0.78262

I then summarized the sentiment values and found a median of 0.177 and a mean of 0.181. Most of the violation text therefore scores as mildly positive, but the values are low: the wording leans positive without sounding strongly favorable.
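The claim that most of the text scores as positive can be checked directly by computing the share of scores on each side of zero. The sketch below uses a hypothetical vector, `sentiment_demo`, in place of `mysample$sentiment`; the numbers are illustrative only.

```r
# Hypothetical sentiment scores standing in for mysample$sentiment.
sentiment_demo <- c(-0.41, -0.05, 0.09, 0.18, 0.27, 0.78, NA)

# Drop missing scores, then compute the share on each side of zero.
scores <- sentiment_demo[!is.na(sentiment_demo)]
share_positive <- mean(scores > 0)
share_negative <- mean(scores < 0)

print(c(positive = share_positive, negative = share_negative))
```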

ggmap(chicago) + geom_point(data = mysample, aes(longitude, latitude, color = sentiment), size = .9) + labs(title = "Sentiment Values for Schools", x = "Longitude", y = "Latitude") + scale_colour_gradient(low="coral", high="steelblue")
## Warning: Removed 36 rows containing missing values (geom_point).

I then mapped the sentiment values to see whether positive values outnumber negative ones. On this map, most of the points fall toward the orange (low) end of the color scale rather than the blue (high) end. This indicates that the violation text for schools that passed mostly scores near zero or negative.

ggplot(data = mysample, aes( x = `inspection date` , y = sentiment)) +
  geom_line(color = "steelblue") + ggtitle("Sentiment Values Over Time") 

Out of curiosity, I wanted to see whether the sentiment values for the schools changed over time. I plotted the sentiments against the inspection dates from 2010 to 2019 and found no clear upward or downward trend.
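Because a line through hundreds of individual inspections is noisy, averaging by year can make a trend (or the lack of one) easier to see. The following is a minimal base R sketch under stated assumptions: `demo` is a hypothetical data frame with made-up dates and scores, and its column names are illustrative.

```r
# Hypothetical inspection dates and sentiment scores.
demo <- data.frame(
  inspection_date = as.Date(c("2010-03-01", "2010-09-15", "2015-05-20",
                              "2019-01-10", "2019-11-30")),
  sentiment = c(0.10, 0.30, 0.20, 0.15, 0.25)
)

# Extract the calendar year, then average the sentiment within each year.
demo$year <- as.integer(format(demo$inspection_date, "%Y"))
yearly <- aggregate(sentiment ~ year, data = demo, FUN = mean)

print(yearly)
```

Plotting `yearly$sentiment` against `yearly$year` would show one point per year instead of one per inspection.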

Conclusion:

In conclusion, the analysis showed that although schools pass the health inspection, they are still classified as risky. Mapping the schools in Chicago by risk level showed that most of them were considered high risk, while very few were considered low or medium risk. After performing sentiment analysis on the violation text of schools that passed, I found that the mean sentiment was still low, at about 0.18. This suggests that even when a school passed, the inspector did not have many positive comments to make about it. Mapping the sentiments over Chicago showed that most of them fall toward the orange end of the scale, meaning they are close to zero or negative. The sentiment values over time do not show a pattern: they fluctuate widely from one inspection to the next but show no sustained trend. I believe this means that schools are still not as healthy for children as they should be. It might be time to consider different inspection criteria to ensure that schools that pass are actually healthy.

Round 2: OkCupid

Abstract:

This dataset contains the profile data for OkCupid users in the city of San Francisco. It consists of 59,946 records and many different variables, including 10 essay questions, which are the focus of this report.

Introduction:

A common belief (or misconception) about dating is that women look for more love, affection, and commitment than men do. Here I focus on the OkCupid data for all users in the city of San Francisco and compare essay responses between men and women. I am curious to see whether women sound more positive in their essays than men and whether women use words like ‘love’ and ‘commitment’ more often. I focus specifically on essay0 for both men and women from all age groups. I perform sentiment analysis on the essays and then look at the most frequent words used.

The questions I ask include: Is the sentiment of the essay0 responses more positive for one sex than the other? Do women say the word ‘love’ more in their essays?
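The second question can be answered by counting whole-word matches per essay. The essays contain HTML markup, which is why fragments such as "classilink", "href", and "schoolbr" appear in the word-frequency table later in this round, so the sketch strips tags first. This is a minimal base R illustration: `essays_demo`, `clean_essay()`, and `count_word()` are hypothetical names, not part of the original analysis.

```r
# Hypothetical essay texts, including HTML residue and a missing value.
essays_demo <- c(
  "i love hiking.<br />\ni love my dog",
  "looking for someone who loves to laugh",
  NA
)

# Replace HTML tags and entities with spaces, then lower-case the text.
clean_essay <- function(x) {
  x <- gsub("<[^>]+>", " ", x)   # tags such as <br /> or <a href="...">
  x <- gsub("&[a-z]+;", " ", x)  # entities such as &amp;
  tolower(x)
}

# Count whole-word occurrences of `word` in each text.
# The \\b anchors mean that "loves" is not counted as "love".
count_word <- function(x, word) {
  hits <- gregexpr(paste0("\\b", word, "\\b"), x, perl = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

essays <- clean_essay(essays_demo[!is.na(essays_demo)])
love_counts <- count_word(essays, "love")

print(love_counts)
```

Applying `count_word()` to the cleaned essay0 text of each sex, and comparing the mean count per essay, would address the question without relying on a 20-row sample.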

oc <- read_csv("https://uofi.box.com/shared/static/oy32nc373w4jqz3kummksnw6wvhfrl7a.csv", 
    col_types = cols(last_online = col_datetime(format = "%Y-%m-%d-%H-%M"))) 
head(oc)
## # A tibble: 6 x 31
##     age body_type diet  drinks drugs education essay0 essay1 essay2 essay3
##   <dbl> <chr>     <chr> <chr>  <chr> <chr>     <chr>  <chr>  <chr>  <chr> 
## 1    22 a little… stri… socia… never working … "abou… "curr… "maki… "the …
## 2    35 average   most… often  some… working … "i am… dedic… "bein… <NA>  
## 3    38 thin      anyt… socia… <NA>  graduate… "i'm … "i ma… "impr… "my l…
## 4    23 thin      vege… socia… <NA>  working … i wor… readi… "play… socia…
## 5    29 athletic  <NA>  socia… never graduate… "hey … work … "crea… i smi…
## 6    29 average   most… socia… <NA>  graduate… "i'm … "buil… "imag… "i ha…
## # … with 21 more variables: essay4 <chr>, essay5 <chr>, essay6 <chr>,
## #   essay7 <chr>, essay8 <chr>, essay9 <chr>, ethnicity <chr>, height <dbl>,
## #   income <dbl>, job <chr>, last_online <dttm>, location <chr>,
## #   offspring <chr>, orientation <chr>, pets <chr>, religion <chr>, sex <chr>,
## #   sign <chr>, smokes <chr>, speaks <chr>, status <chr>
colnames(oc) <- tolower(colnames(oc))

Analysis

women = subset(oc, oc$sex == "f", select=c(sex, essay0))
women = drop_na(women)
mysample <- women[sample(1:nrow(women), 500),]
# Mean sentence-level sentiment of one essay, using only sentences longer than five words.
calculate_sentiment <- function(x){
  essay_sentiments <- sentiment(x)
  return( mean(essay_sentiments[essay_sentiments$word_count > 5]$sentiment))
}

women_sentiment0 = lapply(mysample$essay0, calculate_sentiment)
women_sent = unlist(women_sentiment0)
summary(women_sent)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.4773  0.1447  0.2650  0.2919  0.4159  1.7133      11

The mean sentiment for women's essay0 responses in this sample is 0.2919 (median 0.2650); 11 of the 500 sampled essays produced no score (NA).
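To compare this figure with the corresponding one for men, the two sets of scores can be put through a two-sample test. The sketch below is only an illustration under stated assumptions: `women_demo` and `men_demo` are made-up score vectors, and Welch's t-test (the default of base R's `t.test()`) is one reasonable choice, not a method taken from the original analysis.

```r
# Hypothetical per-essay sentiment scores for each group.
women_demo <- c(0.26, 0.31, 0.14, 0.42, 0.29)
men_demo   <- c(0.22, 0.35, 0.18, 0.27, 0.30)

# Difference in mean sentiment between the two groups.
mean_diff <- mean(women_demo) - mean(men_demo)

# Welch two-sample t-test (does not assume equal variances).
result <- t.test(women_demo, men_demo)

print(mean_diff)
print(result$p.value)
```

With real data, NA scores would need to be removed first, for example with `na.omit()`.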

set.seed(2019)
si <- sample(1:nrow(women),20) #random sample of 20 rows
library(tm)
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
e0 <- data.frame(doc_id=si,text=women$essay0[si],stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(e0))
tryTolower <- function(x){
  y = NA
  try_error = tryCatch(tolower(x), error = function(e) e)
  if (!inherits(try_error, 'error'))
    y = tolower(x)
  return(y)
}
clean.corpus<-function(corpus){
  corpus <- tm_map(corpus, content_transformer(tryTolower))
  corpus <- tm_map(corpus, removeWords, stopwords('english'))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeNumbers)
  return(corpus)
}
newcorpus <- clean.corpus(corpus)
tdm<-TermDocumentMatrix(newcorpus, control=list(weighting=weightTf))
tdm.essay0 <- as.matrix(tdm)
word_freqs <- sort(rowSums(tdm.essay0), decreasing = TRUE)
sfq <- data.frame(words = names(word_freqs), freqs = word_freqs, row.names = NULL)
sfq
##                                 words freqs
## 1                                love    25
## 2                             someone    18
## 3                                like    17
## 4                                life    16
## 5                                want    14
## 6                                time    13
## 7                             looking    12
## 8                                play    12
## 9                                 new    10
## 10                                can     9
## 11                             family     9
## 12                            friends     9
## 13                               just     9
## 14                               know     9
## 15                               meet     9
## 16                             people     9
## 17                             pretty     9
## 18                             really     9
## 19                              years     9
## 20                         classilink     8
## 21                              enjoy     7
## 22                                fun     7
## 23                              right     7
## 24                             things     7
## 25                               nice     6
## 26                               open     6
## 27                             person     6
## 28                              still     6
## 29                               sure     6
## 30                               food     5
## 31                              going     5
## 32                               hard     5
## 33                               href     5
## 34                                now     5
## 35                                one     5
## 36                                way     5
## 37                               work     5
## 38                              world     5
## 39                               year     5
## 40                         adventures     4
## 41                             almost     4
## 42                               also     4
## 43                           anything     4
## 44                               area     4
## 45                                bay     4
## 46                                big     4
## 47                                day     4
## 48                          exploring     4
## 49                            feeling     4
## 50                               find     4
## 51                              first     4
## 52                              games     4
## 53                               good     4
## 54                              happy     4
## 55                              heart     4
## 56                               make     4
## 57                             rather     4
## 58                       relationship     4
## 59                              since     4
## 60                            working     4
## 61                           although     3
## 62                             around     3
## 63                               away     3
## 64                               back     3
## 65                               best     3
## 66                               born     3
## 67                              comes     3
## 68                      communication     3
## 69                        connections     3
## 70                               even     3
## 71                               ever     3
## 72                              feels     3
## 73                               feet     3
## 74                              great     3
## 75                            helping     3
## 76                               home     3
## 77                          important     3
## 78                         interested     3
## 79                          involving     3
## 80                               kind     3
## 81                              laugh     3
## 82                            learned     3
## 83                           learning     3
## 84                                lot     3
## 85                             loving     3
## 86                             making     3
## 87                            married     3
## 88                               much     3
## 89                              music     3
## 90                                old     3
## 91                             others     3
## 92                         passionate     3
## 93                               past     3
## 94                           personal     3
## 95                             raised     3
## 96                                san     3
## 97                          sarcastic     3
## 98                                say     3
## 99                               self     3
## 100                             spent     3
## 101                           usually     3
## 102                              will     3
## 103                               ago     2
## 104                             alone     2
## 105                            always     2
## 106                               amp     2
## 107                              avid     2
## 108                           awesome     2
## 109                           awkward     2
## 110                               bit     2
## 111                             butch     2
## 112                              came     2
## 113                          canadian     2
## 114                           chicago     2
## 115                             class     2
## 116                             coast     2
## 117                              come     2
## 118                       comfortable     2
## 119                         confident     2
## 120                         connected     2
## 121                          consider     2
## 122                           control     2
## 123                         currently     2
## 124                             dates     2
## 125                            dating     2
## 126                             drums     2
## 127                           earlier     2
## 128                             earth     2
## 129                              east     2
## 130                          elements     2
## 131                        especially     2
## 132                             ethic     2
## 133                       experiences     2
## 134                           explore     2
## 135                         extremely     2
## 136                           finding     2
## 137                              five     2
## 138                           forward     2
## 139                         francisco     2
## 140                              free     2
## 141                            friend     2
## 142                           general     2
## 143                          generous     2
## 144                           getting     2
## 145                              girl     2
## 146                        girlfriend     2
## 147                               got     2
## 148                              grad     2
## 149                             hello     2
## 150                            hiking     2
## 151                              hope     2
## 152                         hopefully     2
## 153                             humor     2
## 154                          identity     2
## 155                             ilink     2
## 156                       independent     2
## 157                           instead     2
## 158                         interests     2
## 159                               job     2
## 160                           journey     2
## 161                           knowing     2
## 162                              laid     2
## 163                              last     2
## 164                               let     2
## 165                              live     2
## 166                             lived     2
## 167                              long     2
## 168                             looks     2
## 169                               low     2
## 170                             makes     2
## 171                              many     2
## 172                               may     2
## 173                           midwest     2
## 174                              name     2
## 175                             never     2
## 176                            occupy     2
## 177                           outside     2
## 178                           overall     2
## 179                              park     2
## 180                           partner     2
## 181                           passion     2
## 182                             place     2
## 183                              poly     2
## 184                             proud     2
## 185                               put     2
## 186                             queer     2
## 187                              real     2
## 188                     relationships     2
## 189                        restaurant     2
## 190                              rock     2
## 191                            school     2
## 192                          schoolbr     2
## 193                             sense     2
## 194                               sex     2
## 195                            sexual     2
## 196                             silly     2
## 197                             small     2
## 198                             smart     2
## 199                         something     2
## 200                         sometimes     2
## 201                           special     2
## 202                          spending     2
## 203                             state     2
## 204                             stuff     2
## 205                             sweet     2
## 206                              take     2
## 207                             taken     2
## 208                            taking     2
## 209                           talking     2
## 210                           tattoos     2
## 211                             think     2
## 212                             trees     2
## 213                             value     2
## 214                             video     2
## 215                              walk     2
## 216                             water     2
## 217                              well     2
## 218                             witty     2
## 219                             woman     2
## 220                               yet     2
## 221                              able     1
## 222                            abroad     1
## 223                         abundance     1
## 224                       accessorize     1
## 225                            across     1
## 226                        activities     1
## 227                          actually     1
## 228                               add     1
## 229                        adrenaline     1
## 230                         adventure     1
## 231                      adventuresbr     1
## 232                         agressive     1
## 233                               air     1
## 234                           aligned     1
## 235                           allllll     1
## 236                           amazing     1
## 237                           america     1
## 238                          american     1
## 239                             among     1
## 240                             amuse     1
## 241                            andrea     1
## 242                           anybody     1
## 243                        apparently     1
## 244                        appreciate     1
## 245                      appreciative     1
## 246                         arbitrary     1
## 247                       argentinean     1
## 248                               arm     1
## 249                              arts     1
## 250                          asidesbr     1
## 251                         atrocious     1
## 252                         attention     1
## 253                            august     1
## 254                            babies     1
## 255                               bag     1
## 256                           balance     1
## 257                              band     1
## 258                            banter     1
## 259                               bar     1
## 260                              bart     1
## 261                             bawdy     1
## 262                           beachbr     1
## 263                         beautiful     1
## 264                             begin     1
## 265                           believe     1
## 266                            better     1
## 267                         bicyclist     1
## 268                              bike     1
## 269                             bills     1
## 270                       bittersweet     1
## 271                         bizarrebr     1
## 272                             block     1
## 273                             blond     1
## 274                             board     1
## 275                              body     1
## 276                            bother     1
## 277                             bound     1
## 278                           bowling     1
## 279                         boyfriend     1
## 280                             brain     1
## 281                          breaking     1
## 282                           breathe     1
## 283                            bright     1
## 284                             broad     1
## 285                      broadcasting     1
## 286                          brooklyn     1
## 287                            brunch     1
## 288                          building     1
## 289                          business     1
## 290                        california     1
## 291                            camera     1
## 292                            camper     1
## 293                           camping     1
## 294                               car     1
## 295                              care     1
## 296                            career     1
## 297                            caring     1
## 298                           carving     1
## 299                            casual     1
## 300                           certain     1
## 301                         challenge     1
## 302                           changes     1
## 303                           charmer     1
## 304                             chase     1
## 305                              chat     1
## 306                             check     1
## 307                         chemistry     1
## 308                             child     1
## 309                         childhood     1
## 310                          choicebr     1
## 311                            choose     1
## 312                            circle     1
## 313                              city     1
## 314                            citybr     1
## 315                            clever     1
## 316                           closely     1
## 317                            closer     1
## 318                           clothes     1
## 319                          clothing     1
## 320                        clothingbr     1
## 321                           coaster     1
## 322                            coffee     1
## 323                           college     1
## 324                         collegebr     1
## 325                            comedy     1
## 326                        commenting     1
## 327                        commitment     1
## 328                         committed     1
## 329                       communicate     1
## 330                           company     1
## 331                        composting     1
## 332                          computer     1
## 333                     configuration     1
## 334                        connecting     1
## 335                        connection     1
## 336                       considerate     1
## 337                        constantly     1
## 338                      conversation     1
## 339                           cooking     1
## 340                              cool     1
## 341                         coparents     1
## 342                            couple     1
## 343                            course     1
## 344                         courtship     1
## 345                         coworkers     1
## 346                          creating     1
## 347                          creative     1
## 348                           crucial     1
## 349                           culture     1
## 350                             curls     1
## 351                             curvy     1
## 352                           custody     1
## 353                               cut     1
## 354                             dadbr     1
## 355                           dancing     1
## 356                              dark     1
## 357                           dashing     1
## 358                            dayday     1
## 359                              days     1
## 360                      dealbreakers     1
## 361                              dear     1
## 362                            deeply     1
## 363                        definitely     1
## 364                       definitions     1
## 365                          describe     1
## 366                            design     1
## 367                         designing     1
## 368                           details     1
## 369                      developments     1
## 370                         different     1
## 371                            disney     1
## 372                          dominant     1
## 373                             dorky     1
## 374                              drug     1
## 375                               dry     1
## 376                              dyke     1
## 377                             early     1
## 378                            earned     1
## 379                              ears     1
## 380                              easy     1
## 381                            eating     1
## 382                        electronic     1
## 383                              else     1
## 384                         emotional     1
## 385                       endeavoring     1
## 386                         energizes     1
## 387                            energy     1
## 388                           enjoyed     1
## 389                         enjoyment     1
## 390                            enough     1
## 391                               era     1
## 392                             estsy     1
## 393                               etc     1
## 394                            everbr     1
## 395                          everyday     1
## 396                          everyone     1
## 397                        everything     1
## 398                        everywhere     1
## 399                          exciting     1
## 400                            exeast     1
## 401                         exemplary     1
## 402                         exhusband     1
## 403                             exnew     1
## 404                      expectations     1
## 405                        experience     1
## 406                          explorer     1
## 407                            extent     1
## 408                         extrovert     1
## 409                       extroverted     1
## 410                           factsbr     1
## 411                              fall     1
## 412                              fast     1
## 413                            father     1
## 414                          favorite     1
## 415                              feel     1
## 416                              felt     1
## 417                             femme     1
## 418                             field     1
## 419                           fifteen     1
## 420                            figure     1
## 421                            filled     1
## 422                           finally     1
## 423                           firstbr     1
## 424                               fit     1
## 425                          fixation     1
## 426                            flaunt     1
## 427                             fluid     1
## 428                          fluidity     1
## 429                            foodie     1
## 430                           foreign     1
## 431                          foremost     1
## 432                       fosteringbr     1
## 433                            france     1
## 434                       franciscobr     1
## 435                            french     1
## 436                             fresh     1
## 437                          friendly     1
## 438                            fringe     1
## 439                        frustrated     1
## 440                        fullybaked     1
## 441                             funny     1
## 442                           furious     1
## 443                         furniture     1
## 444                            gamebr     1
## 445                            garden     1
## 446                               gas     1
## 447                            geekbr     1
## 448                       genderqueer     1
## 449                           genghis     1
## 450                            genres     1
## 451                               get     1
## 452                              gets     1
## 453                             ghost     1
## 454                            girlbr     1
## 455                              gone     1
## 456                          goodbyes     1
## 457                          graduate     1
## 458                           grammar     1
## 459                              grew     1
## 460                           growing     1
## 461                        guaranteed     1
## 462                             guess     1
## 463                              guys     1
## 464                             hands     1
## 465                         happiness     1
## 466                          hardware     1
## 467                       hardworking     1
## 468                             harry     1
## 469                              hate     1
## 470                           healthy     1
## 471                           hearted     1
## 472                              hell     1
## 473                               hey     1
## 474                              hide     1
## 475                             hikes     1
## 476                           hippies     1
## 477                              hips     1
## 478                              hold     1
## 479                            honest     1
## 480                          honestly     1
## 481                           hoodies     1
## 482                               hot     1
## 483                             hours     1
## 484                           however     1
## 485          hrefinterestsbadjokesbad     1
## 486         hrefinterestsdesigndesign     1
## 487           hrefinterestseameseames     1
## 488         hrefinterestsguitarguitar     1
## 489         hrefinterestsviolinviolin     1
## 490                              hugs     1
## 491                            humble     1
## 492                            huuuge     1
## 493                     hypergendered     1
## 494                              idea     1
## 495                          idealsbr     1
## 496                            impact     1
## 497                         imperfect     1
## 498                            income     1
## 499                        incredibly     1
## 500                        indulgence     1
## 501                       indulgences     1
## 502                       inquisitive     1
## 503                       intentional     1
## 504                       interacting     1
## 505                       interesting     1
## 506 interestsarchitecturearchitecture     1
## 507               interestsbikesbikes     1
## 508           interestscampingcamping     1
## 509                 interestsmuirmuir     1
## 510    interestsvintageclothesvintage     1
## 511                         intuitive     1
## 512                         invention     1
## 513                          involved     1
## 514                        irritating     1
## 515                              jobs     1
## 516                              joke     1
## 517                             jokes     1
## 518                               joy     1
## 519                            judged     1
## 520                           junkybr     1
## 521                             khunt     1
## 522                              kids     1
## 523                          kindness     1
## 524                          laidback     1
## 525                          language     1
## 526                            latest     1
## 527                          laughing     1
## 528                          laughter     1
## 529                               law     1
## 530                             leaps     1
## 531                             learn     1
## 532                              left     1
## 533                            lifebr     1
## 534                         lifestyle     1
## 535                          lifetime     1
## 536                            likely     1
## 537                             likes     1
## 538                            little     1
## 539                             lives     1
## 540                            living     1
## 541                           locally     1
## 542                            london     1
## 543                            lovers     1
## 544                             loves     1
## 545                             loyal     1
## 546                              made     1
## 547                       maintenance     1
## 548                               man     1
## 549                          marriage     1
## 550                       masstransit     1
## 551                           matters     1
## 552                             maybe     1
## 553                             means     1
## 554                           medical     1
## 555                           meeting     1
## 556                            mellow     1
## 557                         mentioned     1
## 558                            mexico     1
## 559                              mids     1
## 560                          migrated     1
## 561                              milf     1
## 562                             mills     1
## 563                           mindful     1
## 564                             money     1
## 565                             moved     1
## 566                             movie     1
## 567                            movies     1
## 568                          moviesbr     1
## 569                        naturelove     1
## 570                         naviating     1
## 571                            nearly     1
## 572                            neatly     1
## 573                              need     1
## 574                              news     1
## 575                            nights     1
## 576                           nothing     1
## 577                            notice     1
## 578                          numerous     1
## 579                               nyc     1
## 580                             nymag     1
## 581                          obligebr     1
## 582                           obscure     1
## 583                          obsessed     1
## 584                         obsessing     1
## 585                             older     1
## 586                            online     1
## 587                     opportunities     1
## 588                       opportunity     1
## 589                            option     1
## 590                          oriented     1
## 591                        originally     1
## 592                         otherwise     1
## 593                           outdoor     1
## 594                          outdoors     1
## 595                        outdoorsbr     1
## 596                           outlook     1
## 597                         outsidebr     1
## 598                            overly     1
## 599                         paramount     1
## 600                     parenthetical     1
## 601                      particularly     1
## 602                          partners     1
## 603                             parts     1
## 604                          partying     1
## 605                              pass     1
## 606                          passions     1
## 607                            pathbr     1
## 608                               pay     1
## 609                          peoplebr     1
## 610                       performance     1
## 611                           perhaps     1
## 612                           persons     1
## 613                       perspective     1
## 614                             phone     1
## 615                        physically     1
## 616                               pic     1
## 617                           picnics     1
## 618                           picture     1
## 619                            places     1
## 620                            playbr     1
## 621                           playing     1
## 622                            please     1
## 623                          pleasure     1
## 624                             point     1
## 625                          pointing     1
## 626                       polyamorous     1
## 627                            polybr     1
## 628                     polysaturated     1
## 629                              posh     1
## 630                          positive     1
## 631                          possible     1
## 632                            posted     1
## 633                            potter     1
## 634                             power     1
## 635                            prefer     1
## 636                           problem     1
## 637                           produce     1
## 638                           profile     1
## 639                           program     1
## 640                             prove     1
## 641                        proverbial     1
## 642                          provided     1
## 643                           pushing     1
## 644                            queers     1
## 645                            quests     1
## 646                           quickly     1
## 647                             quite     1
## 648                             quote     1
## 649                            racing     1
## 650                           radical     1
## 651                            random     1
## 652                         randomass     1
## 653                              read     1
## 654                            reader     1
## 655                           reading     1
## 656                         realizing     1
## 657                             realm     1
## 658                            reason     1
## 659                        reasonably     1
## 660                            recent     1
## 661                         reforming     1
## 662                        regimented     1
## 663                          relating     1
## 664                           relaxed     1
## 665                         relocated     1
## 666                            remain     1
## 667                          repartee     1
## 668                           respect     1
## 669                            revamp     1
## 670                              ride     1
## 671                          rightsbr     1
## 672                             rolly     1
## 673                           romance     1
## 674                          romantic     1
## 675                             roots     1
## 676                             rowbr     1
## 677                            rowing     1
## 678                             rules     1
## 679                               run     1
## 680                           running     1
## 681                           rushing     1
## 682                              said     1
## 683                             sales     1
## 684                       sarcasticbr     1
## 685                             sassy     1
## 686                              save     1
## 687                             savvy     1
## 688                              says     1
## 689                    scarcitydriven     1
## 690                               see     1
## 691                            seeing     1
## 692                           seeking     1
## 693                             seeks     1
## 694                   selfexploration     1
## 695                       selfreliant     1
## 696                              sell     1
## 697                           selling     1
## 698                              send     1
## 699                         september     1
## 700                           service     1
## 701                     sexpositivity     1
## 702                          sexually     1
## 703                              sexy     1
## 704                              sfsu     1
## 705                            shaped     1
## 706                             share     1
## 707                           sharing     1
## 708                            shifts     1
## 709                              shit     1
## 710                             shows     1
## 711                               shy     1
## 712                           silence     1
## 713                            simple     1
## 714                              site     1
## 715                               sky     1
## 716                            sleazy     1
## 717                               sly     1
## 718                             smile     1
## 719                            snobby     1
## 720                             solid     1
## 721                             south     1
## 722                          souzabee     1
## 723                             space     1
## 724                             spain     1
## 725                              spar     1
## 726                          spelling     1
## 727                       spiritually     1
## 728                       spontaneity     1
## 729                       spontaneous     1
## 730                             spots     1
## 731                            states     1
## 732                            strong     1
## 733                           student     1
## 734                          studying     1
## 735                           stumble     1
## 736                             suits     1
## 737                          sunlight     1
## 738                             sweep     1
## 739                              swim     1
## 740                        swimmingbr     1
## 741                           switchy     1
## 742                             table     1
## 743                              talk     1
## 744                              tall     1
## 745                          tastings     1
## 746                            taught     1
## 747                           teacher     1
## 748                             teeth     1
## 749                              tell     1
## 750                         temporary     1
## 751                               ten     1
## 752                             terms     1
## 753                             tests     1
## 754                            textbr     1
## 755                            theses     1
## 756                             thing     1
## 757                            though     1
## 758                          thoughbr     1
## 759                            timebr     1
## 760                             times     1
## 761                             tired     1
## 762                       togetherand     1
## 763                          tomorrow     1
## 764                             touch     1
## 765                           tourist     1
## 766                              town     1
## 767                       transported     1
## 768                            travel     1
## 769                          traveled     1
## 770                          tropical     1
## 771                            trucks     1
## 772                          trusting     1
## 773                        truthfully     1
## 774                               try     1
## 775                              turn     1
## 776                               two     1
## 777                             types     1
## 778                        undercover     1
## 779                        understand     1
## 780                     understanding     1
## 781                            upbeat     1
## 782                             urban     1
## 783                               use     1
## 784                              used     1
## 785                           variety     1
## 786                              veto     1
## 787                             views     1
## 788                           vintage     1
## 789                             visit     1
## 790                             vital     1
## 791                        vocabulary     1
## 792                              wake     1
## 793                           walking     1
## 794                         wandering     1
## 795                            wanted     1
## 796                             wants     1
## 797                              warm     1
## 798                            warmth     1
## 799                          watching     1
## 800                             waybr     1
## 801                           weather     1
## 802                           weekend     1
## 803                           welcome     1
## 804                       wellbrought     1
## 805                              went     1
## 806                              west     1
## 807                          whatever     1
## 808                          whenever     1
## 809                             whose     1
## 810                              wide     1
## 811                           winning     1
## 812                             words     1
## 813                             worry     1
## 814                         wrestling     1
## 815                            writer     1
## 816                           writing     1
## 817                               yep     1
## 818                               yes     1
## 819                            yorker     1
## 820                            youths     1
ggplot(sfq[1:20,], mapping = aes(x = reorder(words, freqs), y = freqs)) +
  geom_bar(stat= "identity", fill="#d598a3") +
  coord_flip() +
  scale_colour_hue() +
  labs(x= "Words", title = "20 Most Frequent Words (Essay0 Subset for Women)") +
  theme(panel.background = element_blank(), axis.ticks.x = element_blank(),axis.ticks.y = element_blank())

library(wordcloud)
wordcloud(sfq$words,sfq$freqs, min.freq = 1, max.words = 30, colors="#d598a3") 

The bar chart and word cloud show that women use the word ‘love’ frequently, along with words like ‘family’, ‘someone’, ‘years’, and ‘want’. I believe this suggests that women tend to choose words that sound like they are looking for a commitment.
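Raw counts from a 20-essay sample depend on how long the sampled essays happen to be, so they are hard to compare across groups. A minimal sketch of one way to normalize them, assuming a data frame shaped like `sfq` (columns `words` and `freqs`); the helper name `add_share` and the toy counts are hypothetical and are not taken from the essay data:

```r
# Hypothetical helper: convert raw counts into shares of all counted words.
# Expects a data frame with columns `words` and `freqs`, like `sfq`.
add_share <- function(freq_df) {
  freq_df$share <- freq_df$freqs / sum(freq_df$freqs)
  freq_df
}

# Toy input for illustration only; these counts are not from the essays.
toy <- data.frame(words = c("love", "family", "want"),
                  freqs = c(6, 3, 1),
                  stringsAsFactors = FALSE)
toy_share <- add_share(toy)
print(toy_share)
```

Calling `add_share(sfq)` would add the same `share` column to the real frequency table, which makes the word rankings of two samples easier to put side by side.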

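# Keep only men's profiles and drop rows with a missing essay0
men = subset(oc, oc$sex == "m", select=c(sex, essay0))
men = drop_na(men)
# Random sample of 500 men's essays
sm <- men[sample(1:nrow(men), 500),]
# Mean sentiment over the sentences of one essay that have more than 5 words
calculate_sentiment <- function(x){
  sentence_sentiments <- sentiment(x)
  return( mean(sentence_sentiments[sentence_sentiments$word_count > 5]$sentiment))
}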
men_sentiment0 = lapply(sm$essay0, calculate_sentiment)
men_sent = unlist(men_sentiment0)
summary(men_sent)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.3361  0.1194  0.2299  0.2531  0.3599  1.4199      17
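The NA's in the summary most likely come from essays in which no sentence has more than five words. This is an assumption about the data, not something verified here. In that case the filter inside `calculate_sentiment` leaves nothing to average, and the mean of an empty vector is `NaN`, which `summary()` reports under NA's. A base R sketch of that behaviour, using made-up sentence scores:

```r
# Made-up sentence-level values for one short essay (illustration only).
scores     <- c(0.4, 0.1)
word_count <- c(3, 5)

# Same filter as in calculate_sentiment: keep sentences with more than 5 words.
kept <- scores[word_count > 5]
essay_mean <- mean(kept)      # mean of an empty vector is NaN
print(essay_mean)

# NaN counts as missing, so it must be dropped before averaging across essays.
overall <- mean(c(0.2, essay_mean, 0.3), na.rm = TRUE)
print(overall)
```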

The average sentiment for men in essay0 is 0.2531.

The average sentiment for men and women is basically the same, so I believe it isn’t necessarily true that women sound more positive than men.
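Comparing the two means by eye does not say whether a small gap could be sampling noise. One possible check is a Welch two-sample t-test on the per-essay scores. This is a sketch, not part of the original analysis. The helper name `compare_sentiment` is hypothetical, and the placeholder scores below are invented for illustration:

```r
# Hypothetical helper: Welch two-sample t-test on two vectors of per-essay
# sentiment scores, dropping NA/NaN entries first.
compare_sentiment <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  t.test(a, b)  # var.equal = FALSE by default, i.e. Welch's test
}

# Placeholder scores for illustration only; not results from the essays.
group_a <- c(0.21, 0.30, NA, 0.18, 0.27, 0.25)
group_b <- c(0.24, 0.19, 0.33, NaN, 0.22, 0.28)
result <- compare_sentiment(group_a, group_b)
print(result$p.value)
```

With the real data the call would be along the lines of `compare_sentiment(men_sent, women_sent)`, where `women_sent` is an assumed name for the women's score vector.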

set.seed(2019)
sm <- sample(1:nrow(men),20) # row indices for a random sample of 20 men
library(tm)
# Use the sampled row indices as document ids
e8 <- data.frame(doc_id=sm,text=men$essay0[sm],stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(e8))
tryTolower <- function(x){
  y = NA
  try_error = tryCatch(tolower(x), error = function(e) e)
  if (!inherits(try_error, 'error'))
    y = tolower(x)
  return(y)
}
clean.corpus<-function(corpus){
  corpus <- tm_map(corpus, content_transformer(tryTolower))
  corpus <- tm_map(corpus, removeWords, stopwords('english'))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeNumbers)
  return(corpus)
}

newcorpus_m <- clean.corpus(corpus)
tdm_men<-TermDocumentMatrix(newcorpus_m, control=list(weighting=weightTf))
tdm_men.essay0 <- as.matrix(tdm_men)
sfq_men <- data.frame(words=names(sort(rowSums(tdm_men.essay0),decreasing = TRUE)), freqs=sort(rowSums(tdm_men.essay0),decreasing = TRUE), row.names = NULL)
sfq_men
##                   words freqs
## 1                  life    15
## 2                things    10
## 3                  kind     8
## 4               looking     8
## 5                   new     8
## 6                  like     7
## 7                   one     7
## 8                people     7
## 9                   can     6
## 10                 good     6
## 11              someone     6
## 12               always     5
## 13               around     5
## 14              friends     5
## 15                  guy     5
## 16                 meet     5
## 17                  now     5
## 18                  say     5
## 19                 work     5
## 20                 find     4
## 21                  fun     4
## 22                humor     4
## 23                  see     4
## 24                share     4
## 25                 time     4
## 26               trying     4
## 27              working     4
## 28                world     4
## 29               anyone     3
## 30                 best     3
## 31                  bit     3
## 32                enjoy     3
## 33            exploring     3
## 34               friend     3
## 35                going     3
## 36                 last     3
## 37               living     3
## 38                 love     3
## 39                 move     3
## 40              partner     3
## 41               pretty     3
## 42               really     3
## 43                  san     3
## 44                  six     3
## 45                 soon     3
## 46                start     3
## 47                taken     3
## 48                thing     3
## 49                think     3
## 50                  try     3
## 51                 will     3
## 52                 year     3
## 53                 also     2
## 54                 amet     2
## 55              awkward     2
## 56                 back     2
## 57                  bad     2
## 58                  bay     2
## 59              believe     2
## 60                  big     2
## 61                 born     2
## 62             business     2
## 63           california     2
## 64                chill     2
## 65          consectetur     2
## 66            currently     2
## 67                dance     2
## 68           dominating     2
## 69                earth     2
## 70                 east     2
## 71            easygoing     2
## 72             enjoying     2
## 73               family     2
## 74            francisco     2
## 75                 full     2
## 76                great     2
## 77                 grew     2
## 78                group     2
## 79          interesting     2
## 80            interests     2
## 81                ipsum     2
## 82                  job     2
## 83                 just     2
## 84                 keep     2
## 85                 know     2
## 86                knows     2
## 87             learning     2
## 88                lorem     2
## 89                 make     2
## 90                  may     2
## 91              meeting     2
## 92                  met     2
## 93                minds     2
## 94                moved     2
## 95                 much     2
## 96            mushrooms     2
## 97                never     2
## 98                newbr     2
## 99                 nice     2
## 100                open     2
## 101          passionate     2
## 102              places     2
## 103             playing     2
## 104             profile     2
## 105               quiet     2
## 106                quis     2
## 107              raised     2
## 108                read     2
## 109            recently     2
## 110              school     2
## 111                 sit     2
## 112           somewhere     2
## 113           summarize     2
## 114              sweden     2
## 115                take     2
## 116          technology     2
## 117              though     2
## 118            thoughts     2
## 119                told     2
## 120                type     2
## 121              unique     2
## 122                want     2
## 123           wonderful     2
## 124             writing     2
## 125               years     2
## 126             academy     1
## 127          accelerate     1
## 128          activities     1
## 129            actually     1
## 130            adapting     1
## 131          adipiscing     1
## 132           adventure     1
## 133          adventures     1
## 134            afforded     1
## 135                 ago     1
## 136             aliquam     1
## 137             aliquet     1
## 138               along     1
## 139                alto     1
## 140            american     1
## 141                 amp     1
## 142            anything     1
## 143               apply     1
## 144       approximately     1
## 145                 art     1
## 146             asshole     1
## 147          attempting     1
## 148             attract     1
## 149           australia     1
## 150             average     1
## 151          background     1
## 152                ball     1
## 153                band     1
## 154               beach     1
## 155              become     1
## 156          behavioral     1
## 157              beyond     1
## 158             blessed     1
## 159               board     1
## 160                book     1
## 161          borderline     1
## 162               broad     1
## 163             brought     1
## 164                 btw     1
## 165            building     1
## 166            bupdateb     1
## 167              cattle     1
## 168              chance     1
## 169              change     1
## 170           character     1
## 171            charming     1
## 172             chinese     1
## 173               chose     1
## 174              circle     1
## 175       circumstances     1
## 176              cities     1
## 177                city     1
## 178               coast     1
## 179               cocky     1
## 180              coffee     1
## 181             college     1
## 182         comfortable     1
## 183              coming     1
## 184           community     1
## 185           conceited     1
## 186           consequat     1
## 187         considerate     1
## 188         conspicuous     1
## 189          continents     1
## 190      contradictions     1
## 191       conversations     1
## 192                cook     1
## 193             cooking     1
## 194                cool     1
## 195          coordinate     1
## 196          cordinated     1
## 197               corny     1
## 198               count     1
## 199              course     1
## 200                cras     1
## 201            creative     1
## 202             culture     1
## 203               cupid     1
## 204             cycling     1
## 205             cynical     1
## 206            cynicism     1
## 207                dark     1
## 208                dash     1
## 209                date     1
## 210           daysthats     1
## 211                deal     1
## 212               deals     1
## 213                deep     1
## 214          dependable     1
## 215         derivatives     1
## 216              design     1
## 217       developmental     1
## 218             diamond     1
## 219            dictated     1
## 220               diego     1
## 221          difference     1
## 222           different     1
## 223           difficult     1
## 224                dine     1
## 225              direct     1
## 226        disabilities     1
## 227               dives     1
## 228               dolor     1
## 229                done     1
## 230                dont     1
## 231                dork     1
## 232             drawing     1
## 233              dreams     1
## 234             drinkbr     1
## 235               drive     1
## 236                duis     1
## 237               early     1
## 238              earned     1
## 239                easy     1
## 240         economicsbr     1
## 241            educated     1
## 242             egestas     1
## 243                elit     1
## 244         encouraging     1
## 245                 end     1
## 246            endeavor     1
## 247               ended     1
## 248            engineer     1
## 249         engineering     1
## 250                enim     1
## 251              enough     1
## 252                 est     1
## 253             euismod     1
## 254                even     1
## 255            eventsbr     1
## 256          eventually     1
## 257                ever     1
## 258           excellent     1
## 259            exciting     1
## 260           exclusive     1
## 261              exotic     1
## 262              expect     1
## 263          experience     1
## 264          experiment     1
## 265             express     1
## 266                eyes     1
## 267           facilisis     1
## 268              fairly     1
## 269                fast     1
## 270         fatherstill     1
## 271            favorite     1
## 272            feelings     1
## 273               feels     1
## 274               felis     1
## 275             feugiat     1
## 276             figured     1
## 277            figuring     1
## 278              filled     1
## 279             finally     1
## 280            finished     1
## 281              firmly     1
## 282               first     1
## 283                 fit     1
## 284                five     1
## 285              follow     1
## 286               force     1
## 287              forces     1
## 288             forward     1
## 289               found     1
## 290          frequently     1
## 291            friendly     1
## 292           fringilla     1
## 293                game     1
## 294               games     1
## 295             germany     1
## 296                 get     1
## 297               getbr     1
## 298             getting     1
## 299               gives     1
## 300                grad     1
## 301            graduate     1
## 302           graduated     1
## 303        grouponesque     1
## 304              growth     1
## 305             guiding     1
## 306              guitar     1
## 307                guys     1
## 308                hang     1
## 309             hanging     1
## 310              happen     1
## 311             happier     1
## 312               happy     1
## 313                hard     1
## 314             hardest     1
## 315           harmonize     1
## 316                hate     1
## 317                hell     1
## 318                high     1
## 319              highly     1
## 320                hill     1
## 321                 hit     1
## 322                hive     1
## 323                hold     1
## 324            homework     1
## 325             honesty     1
## 326                hope     1
## 327           hopefully     1
## 328              hoping     1
## 329               house     1
## 330               howdy     1
## 331                hunt     1
## 332           hypocrisy     1
## 333                idea     1
## 334          idealistic     1
## 335              ignore     1
## 336          impossible     1
## 337           injustice     1
## 338             instead     1
## 339            interdum     1
## 340           internets     1
## 341       introspective     1
## 342                 ive     1
## 343                jobs     1
## 344               jokes     1
## 345      juuuuuuuuuuust     1
## 346                kewl     1
## 347                 key     1
## 348            kindness     1
## 349             lacking     1
## 350               laugh     1
## 351               learn     1
## 352                line     1
## 353         linguistics     1
## 354                list     1
## 355              little     1
## 356                live     1
## 357               lived     1
## 358               loads     1
## 359             located     1
## 360                long     1
## 361              longer     1
## 362            loserdom     1
## 363                 lot     1
## 364               loves     1
## 365              luxury     1
## 366             magical     1
## 367          magnetized     1
## 368             magnets     1
## 369               makes     1
## 370              making     1
## 371                 man     1
## 372                many     1
## 373            marriage     1
## 374           masculine     1
## 375              master     1
## 376               maybe     1
## 377                mean     1
## 378           mechanics     1
## 379            meditate     1
## 380          meditation     1
## 381              mellow     1
## 382             message     1
## 383            messages     1
## 384               might     1
## 385              mildly     1
## 386                mind     1
## 387              minded     1
## 388             minimum     1
## 389              minute     1
## 390              missed     1
## 391       misunderstood     1
## 392            momentbr     1
## 393             moments     1
## 394               money     1
## 395              months     1
## 396            monthsbr     1
## 397               morbi     1
## 398              mostly     1
## 399               movie     1
## 400              movies     1
## 401            mushroom     1
## 402             musical     1
## 403            mutually     1
## 404                name     1
## 405            napoleon     1
## 406              nature     1
## 407              nearly     1
## 408                need     1
## 409                news     1
## 410           nicaragua     1
## 411                nisi     1
## 412                 non     1
## 413           nonprofit     1
## 414               north     1
## 415                odio     1
## 416          officially     1
## 417               often     1
## 418                 okc     1
## 419             okcupid     1
## 420                 old     1
## 421                ones     1
## 422             oneself     1
## 423            optimism     1
## 424          originally     1
## 425              ornare     1
## 426              others     1
## 427           otherwise     1
## 428             outcome     1
## 429             packing     1
## 430             padding     1
## 431                palo     1
## 432           paragraph     1
## 433                part     1
## 434               parts     1
## 435                path     1
## 436               peeps     1
## 437           peninsula     1
## 438            perceive     1
## 439           perfectly     1
## 440              person     1
## 441         perspective     1
## 442         photography     1
## 443            pictures     1
## 444               place     1
## 445                play     1
## 446              please     1
## 447 pleasurablestrongbr     1
## 448              postal     1
## 449              prefer     1
## 450             process     1
## 451            programs     1
## 452            projects     1
## 453             promise     1
## 454            pulvinar     1
## 455                puns     1
## 456                quam     1
## 457             quarter     1
## 458             quickly     1
## 459               quite     1
## 460              quotes     1
## 461              rather     1
## 462             reading     1
## 463              reason     1
## 464               rebel     1
## 465              recent     1
## 466              refuse     1
## 467          regardless     1
## 468        relationship     1
## 469         repairteach     1
## 470            replying     1
## 471            research     1
## 472             respond     1
## 473                ride     1
## 474                risk     1
## 475                rock     1
## 476             rolling     1
## 477            romantic     1
## 478                root     1
## 479               rough     1
## 480          roundworld     1
## 481            rustling     1
## 482             sailing     1
## 483                salt     1
## 484           sarcastic     1
## 485               savvy     1
## 486               saybr     1
## 487         scelerisque     1
## 488             science     1
## 489              search     1
## 490             seattle     1
## 491             section     1
## 492                self     1
## 493        selfabsorbed     1
## 494          selfdriven     1
## 495               sense     1
## 496                shit     1
## 497             shopsbr     1
## 498           silencebr     1
## 499              single     1
## 500              sitebr     1
## 501             sitting     1
## 502          situations     1
## 503              sketch     1
## 504              slowly     1
## 505               smart     1
## 506              social     1
## 507            software     1
## 508              solely     1
## 509            somebody     1
## 510           something     1
## 511           sometimes     1
## 512               sorry     1
## 513               sound     1
## 514             special     1
## 515               spent     1
## 516               spots     1
## 517            stanford     1
## 518             startup     1
## 519                stay     1
## 520         stimulating     1
## 521              strong     1
## 522        strongstrong     1
## 523    strongunexpected     1
## 524             success     1
## 525              summer     1
## 526               swear     1
## 527               sweet     1
## 528             sweeter     1
## 529               takes     1
## 530              taking     1
## 531                 tap     1
## 532            teaching     1
## 533        teleprompter     1
## 534              tellus     1
## 535              tempor     1
## 536         temporarily     1
## 537                tend     1
## 538              tennis     1
## 539               terms     1
## 540               thats     1
## 541          thoughtful     1
## 542               three     1
## 543                tiny     1
## 544               title     1
## 545                 top     1
## 546            totaling     1
## 547               trade     1
## 548       transcendence     1
## 549              travel     1
## 550          treasuring     1
## 551               tries     1
## 552                trip     1
## 553               trips     1
## 554           tristique     1
## 555               trust     1
## 556              turpis     1
## 557              update     1
## 558               value     1
## 559               vapid     1
## 560              varius     1
## 561               views     1
## 562             villain     1
## 563            visiting     1
## 564               vitae     1
## 565             vivamus     1
## 566        volunteering     1
## 567            volutpat     1
## 568                wait     1
## 569             waiting     1
## 570                wake     1
## 571              wanted     1
## 572               wants     1
## 573            watching     1
## 574                 way     1
## 575                webs     1
## 576                well     1
## 577              wellbr     1
## 578            whatever     1
## 579             whoever     1
## 580                wide     1
## 581                wish     1
## 582               woman     1
## 583               women     1
## 584             womenbr     1
## 585              wonder     1
## 586              workbr     1
## 587              worked     1
## 588          workerauto     1
## 589               worry     1
## 590               worst     1
## 591               write     1
## 592                 yet     1
## 593                yoga     1
## 594               youth     1
## 595            zanzibar     1
# Bar chart of the first 20 rows of sfq_men (the most frequent words)
ggplot(sfq_men[1:20, ], mapping = aes(x = reorder(words, freqs), y = freqs)) +
  geom_col(fill = "#659ad5") +  # same as geom_bar(stat = "identity")
  coord_flip() +
  labs(x = "Words", y = "Frequency", title = "20 Most Frequent Words (Essay0 Subset for Men)") +
  theme(panel.background = element_blank(), axis.ticks.x = element_blank(), axis.ticks.y = element_blank())

library(wordcloud)
# Word cloud of at most 30 words; the least frequent words are dropped first
wordcloud(sfq_men$words, sfq_men$freqs, min.freq = 1, max.words = 30, colors = "#659ad5")

According to the bar chart and word cloud created for men, the most frequent word is ‘life’. The word ‘love’ does not appear among the 20 most frequent words. Instead, we see words like ‘new’, ‘things’, ‘looking’, and ‘like’. I believe this suggests that the men on OkCupid are not looking for a serious commitment.

Conclusion:

In conclusion, the analysis showed that the sentiments of essay0 were the same for both men and women. This surprised me, since I expected women to sound more positive in their essays than men. Because the sentiment analysis did not give the results I expected, I went on to analyze the frequency of the words used in the essays. I found that women tend to use the word ‘love’ more, while ‘love’ did not appear among the men’s most frequent words in the sample. I think this shows that women on OkCupid are looking for more of a commitment, since they also use the words ‘family’ and ‘years’ while men use words like ‘new’ and ‘things’. However, this might not be enough to draw conclusions about commitment, since I am basing it on what I believe commitment sounds like. Women may write more words in general, or some other underlying variable may explain why women use the word ‘love’ more than men. This is an important question that requires further analysis.
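
One way to check whether women simply write more words is to compare each word's share of all counted words rather than its raw count. The sketch below is only an illustration: the tables freq_women and freq_men and the helper relative_freq() are hypothetical names, the counts are made up, and the tables are assumed to have the same words and freqs columns as sfq_men.

```r
# Hypothetical toy frequency tables with the same columns as sfq_men
# (words, freqs). The counts are made up for illustration only.
freq_women <- data.frame(words = c("love", "life", "family"),
                         freqs = c(6, 4, 2),
                         stringsAsFactors = FALSE)
freq_men <- data.frame(words = c("life", "new", "love"),
                       freqs = c(3, 2, 1),
                       stringsAsFactors = FALSE)

# Share of all counted words that a given word accounts for.
# Returns 0 when the word does not occur in the table.
relative_freq <- function(freq_table, word) {
  total <- sum(freq_table$freqs)
  hits <- sum(freq_table$freqs[freq_table$words == word])
  hits / total
}

relative_freq(freq_women, "love")  # 6 of 12 counted words
relative_freq(freq_men, "love")    # 1 of 6 counted words
```

Comparing shares instead of raw counts keeps a group that writes longer essays from looking like it uses a word more often just because it has more words overall.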